SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: LOWE, JOHN B. 

(ii) TITLE OF INVENTION: METHODS AND PRODUCTS FOR THE SYNTHESIS 
OF OLIGOSACCHARIDE STRUCTURES ON GLYCOPROTEINS, 
GLYCOLIPIDS, OR AS FREE MOLECULES, AND FOR THE ISOLATION 
OF CLONED GENETIC SEQUENCES THAT DETERMINE THESE STRUCTU 

(iii) NUMBER OF SEQUENCES: 14 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: OBLON, SPIVAK, MCCLELLAND, MAIER & NEUSTADT, P.C. 

(B) STREET: 1755 Jefferson Davis Highway, Fourth Floor 

(C) CITY: Arlington 

(D) STATE: Virginia 

(E) COUNTRY: U.S.A. 

(F) ZIP: 22202 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: US 

(B) FILING DATE: 20-JUL-1992 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Lavalleye, Jean-Paul M. P. 

(B) REGISTRATION NUMBER: 31,451 

(C) REFERENCE/DOCKET NUMBER: 2363-060-55 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (703)521-4500 

(B) TELEFAX: (703)486-2347 

(C) TELEX: 248855 OPAT UR 



(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2043 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: cDNA 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 



AGGAAACCTG CCATGGCCTC CTGGTGAGCT GTCCTCATCC ACTGCTCGCT GCCTCTCCAG 60

ATACTCTGAC CCATGGATCC CCTGGGTGCA GCCAAGCCAC AATGGCCATG GCGCCGCTGT 120

CTGGCCGCAC TGCTATTTCA GCTGCTGGTG GCTGTGTGTT TCTTCTCCTA CCTGCGTGTG 180

TCCCGAGACG ATGCCACTGG ATCCCCTAGG GCTCCCAGTG GGTCCTCCCG ACAGGACACC 240

ACTCCCACCC GCCCCACCCT CCTGATCCTG CTATGGACAT GGCCTTTCCA CATCCCTGTG 300

GCTCTGTCCC GCTGTTCAGA GATGGTGCCC GGCACAGCCG ACTGCCACAT CACTGCCGAC 360

CGCAAGGTGT ACCCACAGGC AGACACGGTC ATCGTGCACC ACTGGGATAT CATGTCCAAC 420

CCTAAGTCAC GCCTCCCACC TTCCCCGAGG CCGCAGGGGC AGCGCTGGAT CTGGTTCAAC 480

TTGGAGCCAC CCCCTAACTG CCAGCACCTG GAAGCCCTGG ACAGATACTT CAATCTCACC 540

ATGTCCTACC GCAGCGACTC CGACATCTTC ACGCCCTACG GCTGGCTGGA GCCGTGGTCC 600

GGCCAGCCTG CCCACCCACC GCTCAACCTC TCGGCCAAGA CCGAGCTGGT GGCCTGGGCG 660

GTGTCCAACT GGAAGCCGGA CTCAGCCAGG GTGCGCTACT ACCAGAGCCT GCAGGCTCAT 720

CTCAAGGTGG ACGTGTACGG ACGCTCCCAC AAGCCCCTGC CCAAGGGGAC CATGATGGAG 780

ACGCTGTCCC GGTACAAGTT CTACCTGGCC TTCGAGAACT CCTTGCACCC CGACTACATC 840

ACCGAGAAGC TGTGGAGGAA CGCCCTGGAG GCCTGGGCCG TGCCCGTGGT GCTGGGCCCC 900

AGCAGAAGCA ACTACGAGAG GTTCCTGCCA CCCGACGCCT TCATCCACGT GGACGACTTC 960

CAGAGCCCCA AGGACCTGGC CCGGTACCTG CAGGAGCTGG ACAAGGACCA CGCCCGCTAT 1020

CTGAGCTACT TTCGCTGGCG GGAGACGCTG CGGCCTCGCT CCTTCAGCTG GGCACTGGAT 1080

TTCTGCAAGG CCTGCTGGAA ACTGCAGCAG GAATCCAGGT ACCAGACGGT GCGCAGCATA 1140

GCGGCTTGGT TCACCTGAGA GGCCGGCATG GTGCCTGGGC TGCCGGGAAC CTCATCTGCC 1200

TGGGGCCTCA CCTGCTGGAG TCCTTTGTGG CCAACCCTCT CTCTTACCTG GGACCTCACA 1260

CGCTGGGCTT CACGGCTGCC AGGAGCCTCT CCCCTCCAGA AGACTTGCCT GCTAGGGACC 1320

TCGCCTGCTG GGGACCTCGC CTGTTGGGGA CCTCACCTGC TGGGGACCTC ACCTGCTGGG 1380

GACCTTGGCT GCTGGAGGCT GCACCTACTG AGGATGTCGG CGGTCGGGGA CTTTACCTGC 1440

TGGGACCTGC TCCCAGAGAC CTTGCCACAC TGAATCTCAC CTGCTGGGGA CCTCACCCTG 1500

GAGGGCCCTG GGCCCTGGGG AACTGGCTTA CTTGGGGCCC CACCCGGGAG TGATGGTTCT 1560

GGCTGATTTG TTTGTGATGT TGTTAGCCGC CTGTGAGGGG TGCAGAGAGA TCATCACGGC 1620

ACGGTTTCCA GATGTAATAC TGCAAGGAAA AATGATGACG TGTCTCCTCA CTCTAGAGGG 1680

GTTGGTCCCA TGGGTTAAGA GCTCACCCCA GGTTCTCACC TCAGGGGTTA AGAGCTCAGA 1740

GTTCAGACAG GTCCAAGTTC AAGCCCAGGA CCACCACTTA TAGGGTACAG GTGGGATCGA 1800

CTGTAAATGA GGACTTCTGG AACATTCCAA ATATTCTGGG GTTGAGGGAA ATTGCTGCTG 1860

TCTACAAAAT GCCAAGGGTG GACAGGCGCT GTGGCTCACG CCTGTAATTC CAGCACTTTG 1920

GGAGGCTGAG GTAGGAGGAT TGATTGAGGC CAAGAGTTAA AGACCAGCCT GGTCAATATA 1980

GCAAGACCAC GTCTCTAAAT AAAAAATAAT AGGCCGGCCA GGAAAAAAAA AAAAAAAAAA 2040

AAA 2043

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 361 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Asp Pro Leu Gly Ala Ala Lys Pro Gln Trp Pro Trp Arg Arg Cys 
1 5 10 15 

Leu Ala Ala Leu Leu Phe Gln Leu Leu Val Ala Val Cys Phe Phe Ser 
20 25 30 

Tyr Leu Arg Val Ser Arg Asp Asp Ala Thr Gly Ser Pro Arg Ala Pro 
35 40 45 

Ser Gly Ser Ser Arg Gln Asp Thr Thr Pro Thr Arg Pro Thr Leu Leu 
50 55 60 

Ile Leu Leu Trp Thr Trp Pro Phe His Ile Pro Val Ala Leu Ser Arg 
65 70 75 80 

Cys Ser Glu Met Val Pro Gly Thr Ala Asp Cys His Ile Thr Ala Asp 
85 90 95 

Arg Lys Val Tyr Pro Gln Ala Asp Thr Val Ile Val His His Trp Asp 
100 105 110 

Ile Met Ser Asn Pro Lys Ser Arg Leu Pro Pro Ser Pro Arg Pro Gln 
115 120 125 

Gly Gln Arg Trp Ile Trp Phe Asn Leu Glu Pro Pro Pro Asn Cys Gln 
130 135 140 

His Leu Glu Ala Leu Asp Arg Tyr Phe Asn Leu Thr Met Ser Tyr Arg 
145 150 155 160 

Ser Asp Ser Asp Ile Phe Thr Pro Tyr Gly Trp Leu Glu Pro Trp Ser 
165 170 175 

Gly Gln Pro Ala His Pro Pro Leu Asn Leu Ser Ala Lys Thr Glu Leu 
180 185 190 

Val Ala Trp Ala Val Ser Asn Trp Lys Pro Asp Ser Ala Arg Val Arg 
195 200 205 

Tyr Tyr Gln Ser Leu Gln Ala His Leu Lys Val Asp Val Tyr Gly Arg 
210 215 220 

Ser His Lys Pro Leu Pro Lys Gly Thr Met Met Glu Thr Leu Ser Arg 
225 230 235 240 

Tyr Lys Phe Tyr Leu Ala Phe Glu Asn Ser Leu His Pro Asp Tyr Ile 
245 250 255 

Thr Glu Lys Leu Trp Arg Asn Ala Leu Glu Ala Trp Ala Val Pro Val 
260 265 270 

Val Leu Gly Pro Ser Arg Ser Asn Tyr Glu Arg Phe Leu Pro Pro Asp 
275 280 285 

Ala Phe Ile His Val Asp Asp Phe Gln Ser Pro Lys Asp Leu Ala Arg 
290 295 300 

Tyr Leu Gln Glu Leu Asp Lys Asp His Ala Arg Tyr Leu Ser Tyr Phe 
305 310 315 320 

Arg Trp Arg Glu Thr Leu Arg Pro Arg Ser Phe Ser Trp Ala Leu Asp 
325 330 335 

Phe Cys Lys Ala Cys Trp Lys Leu Gln Gln Glu Ser Arg Tyr Gln Thr 
340 345 350 

Val Arg Ser Ile Ala Ala Trp Phe Thr 
355 360 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1500 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: cDNA 
(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

CCTTCCCTTG TAGACTCTTC TTGGAATGAG AAGTACCGAT TCTGCTGAAG ACCTCGCGCT 60

CTCAGGCTCT GGGAGTTGGA ACCCTGTACC TTCCTTTCCT CTGCTGAGCC CTGCCTCCTT 120

AGGCAGGCCA GAGCTCGACA GAACTCGGTT GCTTTGCTGT TTGCTTTGGA GGGAACACAG 180

CTGACGATGA GGCTGACTTT GAACTCAAGA GATCTGCTTA CCCCAGTCTC CTGGAATTAA 240

AGGCCTGTAC TACATTTGCC TGGACCTAAG ATTTTCATGA TCACTATGCT TCAAGATCTC 300

CATGTCAACA AGATCTCCAT GTCAAGATCC AAGTCAGAAA CAAGTCTTCC ATCCTCAAGA 360

TCTGGATCAC AGGAGAAAAT AATGAATGTC AAGGGAAAAG TAATCCTGTT GATGCTGATT 420

GTCTCAACCG TGGTTGTCGT GTTTTGGGAA TATGTCAACA GAATTCCAGA GGTTGGTGAG 480

AACAGATGGC AGAAGGACTG GTGGTTCCCA AGCTGGTTTA AAAATGGGAC CCACAGTTAT 540

CAAGAAGACA ACGTAGAAGG ACGGAGAGAA AAGGGTAGAA ATGGAGATCG CATTGAAGAG 600

CCTCAGCTAT GGGACTGGTT CAATCCAAAG AACCGCCCGG ATGTTTTGAC AGTGACCCCG 660

TGGAAGGCGC CGATTGTGTG GGAAGGCACT TATGACACAG CTCTGCTGGA AAAGTACTAC 720

GCCACACAGA AACTCACTGT GGGGCTGACA GTGTTTGCTG TGGGAAAGTA CATTGAGCAT 780

TACTTAGAAG ACTTTCTGGA GTCTGCTGAC ATGTACTTCA TGGTTGGCCA TCGGGTCATA 840

TTTTACGTCA TGATAGACGA CACCTCCCGG ATGCCTGTCG TGCACCTGAA CCCTCTACAT 900

TCCTTACAAG TCTTTGAGAT CAGGTCTGAG AAGAGGTGGC AGGATATCAG CATGATGCGC 960

ATGAAGACCA TTGGGGAGCA CATCCTGGCC CACATCCAGC ACGAGGTCGA CTTCCTCTTC 1020

TGCATGGACG TGGATCAAGT CTTTCAAGAC AACTTCGGGG TGGAAACTCT GGGCCAGCTG 1080

GTAGCACAGC TCCAGGCCTG GTGGTACAAG GCCAGTCCCG AGAAGTTCAC CTATGAGAGG 1140

CGGGAACTGT CGGCCGCGTA CATTCCATTC GGAGAGGGGG ATTTTTACTA CCACGCGGCC 1200

ATTTTTGGAG GAACGCCTAC TCACATTCTC AACCTCACCA GGGAGTGCTT TAAGGGGATC 1260

CTCCAGGACA AGAAACATGA CATAGAAGCC CAGTGGCATG ATGAGAGCCA CCTCAACAAA 1320

TACTTCCTTT TCAACAAACC CACTAAAATC CTATCTCCAG AGTATTGCTG GGACTATCAG 1380

ATAGGCCTGC CTTCAGATAT TAAAAGTGTC AAGGTAGCTT GGCAGACAAA AGAGTATAAT 1440

TTGGTTAGAA ATAATGTCTG ACTTCAAATT GTGATGGAAA CTTGACACTA TTTCTAACCA 1500



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 394 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Ile Thr Met Leu Gln Asp Leu His Val Asn Lys Ile Ser Met Ser 
1 5 10 15 

Arg Ser Lys Ser Glu Thr Ser Leu Pro Ser Ser Arg Ser Gly Ser Gln 
20 25 30 

Glu Lys Ile Met Asn Val Lys Gly Lys Val Ile Leu Leu Met Leu Ile 
35 40 45 

Val Ser Thr Val Val Val Val Phe Trp Glu Tyr Val Asn Arg Ile Pro 
50 55 60 

Glu Val Gly Glu Asn Arg Trp Gln Lys Asp Trp Trp Phe Pro Ser Trp 
65 70 75 80 

Phe Lys Asn Gly Thr His Ser Tyr Gln Glu Asp Asn Val Glu Gly Arg 
85 90 95 

Arg Glu Lys Gly Arg Asn Gly Asp Arg Ile Glu Glu Pro Gln Leu Trp 
100 105 110 

Asp Trp Phe Asn Pro Lys Asn Arg Pro Asp Val Leu Thr Val Thr Pro 
115 120 125 

Trp Lys Ala Pro Ile Val Trp Glu Gly Thr Tyr Asp Thr Ala Leu Leu 
130 135 140 

Glu Lys Tyr Tyr Ala Thr Gln Lys Leu Thr Val Gly Leu Thr Val Phe 
145 150 155 160 

Ala Val Gly Lys Tyr Ile Glu His Tyr Leu Glu Asp Phe Leu Glu Ser 
165 170 175 

Ala Asp Met Tyr Phe Met Val Gly His Arg Val Ile Phe Tyr Val Met 
180 185 190 

Ile Asp Asp Thr Ser Arg Met Pro Val Val His Leu Asn Pro Leu His 
195 200 205 

Ser Leu Gln Val Phe Glu Ile Arg Ser Glu Lys Arg Trp Gln Asp Ile 
210 215 220 

Ser Met Met Arg Met Lys Thr Ile Gly Glu His Ile Leu Ala His Ile 
225 230 235 240 

Gln His Glu Val Asp Phe Leu Phe Cys Met Asp Val Asp Gln Val Phe 
245 250 255 

Gln Asp Asn Phe Gly Val Glu Thr Leu Gly Gln Leu Val Ala Gln Leu 
260 265 270 

Gln Ala Trp Trp Tyr Lys Ala Ser Pro Glu Lys Phe Thr Tyr Glu Arg 
275 280 285 

Arg Glu Leu Ser Ala Ala Tyr Ile Pro Phe Gly Glu Gly Asp Phe Tyr 
290 295 300 

Tyr His Ala Ala Ile Phe Gly Gly Thr Pro Thr His Ile Leu Asn Leu 
305 310 315 320 

Thr Arg Glu Cys Phe Lys Gly Ile Leu Gln Asp Lys Lys His Asp Ile 
325 330 335 

Glu Ala Gln Trp His Asp Glu Ser His Leu Asn Lys Tyr Phe Leu Phe 
340 345 350 

Asn Lys Pro Thr Lys Ile Leu Ser Pro Glu Tyr Cys Trp Asp Tyr Gln 
355 360 365 

Ile Gly Leu Pro Ser Asp Ile Lys Ser Val Lys Val Ala Trp Gln Thr 
370 375 380 

Lys Glu Tyr Asn Leu Val Arg Asn Asn Val 
385 390 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8174 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

GAATTCCATC GTGGCAAGGG CAGCCTGAAT GGATGATGTA ACCTGGGGTC CTTTCAATGG 60

AGGGCCAGAC TCCTGGGTCT AGGGGATGAG GGAGGGGAGG ATCGGGTTAG CTGGGACCCA 120

GGTGAAAGGG GCTGGGGGCC CACATTCCTG AGTCTCAGAG AGAAGGATCT GGGGTCTCAA 180

GCACCTGAGT CGGAGGGAGG AGGGGTGCTG GGCTCCTGGA AAAACCACCT CTTGGACCAT 240

CTATGCAGAT CACGCAGAAC AAGAGAAATT TCTGCGCCCC ATCTGAATTT CTAAGTTTGG 300

GGGGAGGGCG TGATCTGACA CTGAGGTTCC TTGATCCTCA GCAAGGCGGC AATTGCTGTA 360

TGAAAGAAGC GACCGCATCT GAGACACAAG TATCCTGCCT TGGAAGCCTC TCACCTGGCC 420

GTGGGCCAAC CTCAACCTCA TCTGTCCCTG CTCAGATGCT CAGACCCTGG ACATCCCAGC 480

CTCCTCCTCC CTGATGCAAT CCTGGTGTTT CTTTCACCAG AGAAGCCATC CCAGGCCCAG 540

GCAGGTGCTC CTGAAATAAC CTGGGGGGAG GGGTGGCTGA AAGTCCCTGA CTGGAGTTGG 600

CAGCCAAGCC AGGCCCTGGA GTGGGCACCC AGAGGGAAGA CAGGTTGGCT AATTTCCTGG 660

AGCCCCTAAG GGTGCAAGGG TAGGCCTTCT GTGTCTGAGG GAGGAGGGCT GGGGCTCTGG 720

ACTCCTGGGT CTGAGGGAGG AGGGGTGGGG GGCCTGGACT CCTGGGTCTG AGGGAGGAGG 780

GTCTGGGCCT GTACTCCTGG ATCTGAGGGA GGAGGGGCTG GGGAACTTGG GCTCCTGGGT 840

CTGAGGGAGG AGGGAGCTTT GGTCTGGACT CCTGGGTCTG AGGGAGTAGG GGCTAGGGAT 900
CTGGACTCGT GGGTGTGAGG AAGGAGGGGC TGGGGTCCTG GACTCCTGGG TCTGAGGAAG 960

GAGGGGCAGG GGGCTTGGAC TCCTGGGTCT GAGGAAGGAG GGGCCGGGAG CCTGGACTCC 1020

TAAGTCTGAG GGAGGAGGGT CTGGGGGCCT GGACTGCTGG GTGTGAGCAG AAGGGTCTGG 1080

GTGCTGGGAG TCCCGAGCCT GGGGAGATGA TGGTTAAACT TCTGGGAATC AAGTCAAACT 1140

CCTGAGTCTT TGACATTGAT GTATCTTGAA TGGGAGGGTC AGTCTGTGGG GAAGGATTAC 1200

CCAGGTGCCG AGGCAAGAGA CTGAAGGCAC AAACTGTTTC AGTATAATAA AGAAAATAGT 1260

TAGAATAAGA ATAGTTATCA TACAAATTAG ATATAGAGAT GATCATGGAC AGTATCAATC 1320

ATTAGTGTAA ACATTATTAA TCATTAGCTA TTACTTTTAT TCTTTGTTGT ATAACTAATA 1380

TAACCAGGAA ACAACCGGTG GGTATAGGGT CAGGTACTGA AGGGACATTG TGAGAAGTGA 1440

CCTAGAAGGC AAGAGGTGAG CCTTCTGTCA CACCGGCATA AGGGCCTCTT GAGGGCTCCT 1500

TGGTCAAGCG GGAACGCCAG TGTCTGGGAA GGCACCCGTT ACTCAGCAGA CCACGAAAGG 1560

GAATCTCCTT TTCTTGGAGG AGTCAGGGAA CACTCTGCTC CACCAGCTTC TTGTGGGAGG 1620




CTGGGTATTA 


iTCTAGG.CCTG-CCCGCAaTXZA-TCCTCCTGTG^T 


168< 




TCCTTGTCCT CTTGCATTTT CCTCCCGTAC TCCTGGTTCC TCTTTGAAGT TCGTAGTAGA 1740

TAGCGGTAGA AGAAATAGTG AAAGCCTTTT TTTTTTTTTT TTTGAGGCGG AGTCTCGCTC 1800

TGTCCCCCAG GCTGGAGTGC AGTGGCGTGA TCTCGGCTCA CTGCAATCTC CGCCTCCTGG 1860

GTTCACACCA TTCTCCTGCC TCACCCTCCC AAATAGCTAG GACTACAGGC GCCCTCCACC 1920

ACGCGCCCGG ATAATTTTTT GTATTTTTAG TAGAGACAGG GTTTCACCGT GTTAGCCAGG 1980

ATGGCCTCCA CCTCCTGACC TTGTGATCCG CCCGCCTCAG CCTCCCAAAG TGCTGGGATT 2040

ACAGGCGTGA GCCACCGCGC CCGCCCGAAA TAGTGAAAGT CTTAAAGTCT TTGATCTTTC 2100

TTATAAGTGC AGAGAAGAAA ACGCTGACAT ATGCTGCCTT CTCTTTCTGC TTCGGCTGCC 2160

TAAAAGGGAA GGGCCCCCTG TCCCATGATC ACGTGACTTG CTTGACCTTA TCAGTCATTT 2220

GGACGACTCA CCCTCCTTAT CCTGCCCCCC CTTGTCTTGT ATACAATAAA TATCAGCGCG 2280

CCCAGCCATT CGGGGCCACT ACCGGTCTCT GCGTCTTGAT GGTAGTGGTC CCCCGGGCCC 2340

AGCTGTTTTC TCTTTATCTC TTTGTCTTGT GTCTTTATTT CTTACAATCT CTCCTCTCCT 2400

CACAGGGGAA GAACACCCAC CCGCAAAGCC CCGTAGGGCT GGACCCTACG TTAGCCTGCC 2460

CTGCTCGGGG TTGGCGATGC TGGAGGTGGG CCTTGGACCA GAGAAAATGC TTTAATTAGG 2520

TGACAAGCGG GCAGAGGCCT TTGTCTCTGG CGCCGGCAGC CACGGCCCCC GCTGACGGCG 2580

TGGGAAACAG ACCCTGTTCC ACTCCGGTCT CCAGCCTTGG AATGGTTGCC TTCGTGCAGT 2640

GCAGGTCTGG AAAGTAGCAG TTTGGCACGG GACCCTAGAA TTCCCCAAAA GGAGTGACTA 2700

GGGGCTGGGA TTCTGGAATT TGAGTGTGGA CGGTGAGGCG GGGGGTGTGG GAGATCGGAG 2760

ACCCTGGTGG GCGCGGGAGC ACCTGCAGGC TGGAGGCCCT CGCGCGCTCC GGCGGCAGCC 2820

TGGGAAACAG GTTCTCCATC CCCCAGGAGG ACGCGGCAGA GGGCGGACGA TCGCTCCACT 2880

CGCCGGGACC AGGTGCGGGG GCCCTGCCCA GCCGCTGGGG CGTGGCCAGG CTCGAAGCAC 2940

CCAGGTGTCG GGGGCCGACT CTAAGCCCTG GCACCGGAAG AGAGAGGGCG GCGGATTGGA 3000

CCTCCCGGCT CCAGCATTGC AACTGGGCGC TCCGTCTCCT GGTCCACGCA ATGATGCTGC 3060

GGCTGCTCAG AAGCCAGGTA GCCTGCCCTG GGTGAAGCCT TCGCGCAGGT CAATGACGGG 3120

GCGGAGGGGC AGGGCGCGGT CCCCTGCATC CCCGATCTGG GGAGCGGTGG GCCCAGGGGC 3180

CATCGCCTTA GCCCCTGGCG CTGGGGCTCG GCGCCAAGTG ACGGGCGGGG CTCCACCTTG 3240

CAGCCATCCG CCCGGCCCGG GAGGGCGGAC GCTGCGAGAC TCCCGGCCGC GCCCTCTCCT 3300

TCCTCTCCTC CCCAAGCCCT CGCTGCCAGT CCGGACAGGC TGCGCGGAGG GGAGGGCTGC 3360

CGGGCCGGAT AGCCGGACGC CTGGCGTTCC AGGGGCGGCC GGATGTGGCC TGCCTTTGCG 3420

GAGGGTGCGC TCCGGCCACG AAAAGCGGAC TGTGGATCTG CCACCTGCAA GCAGCTCGGG 3480

TAAGTGGGGA CTGCCCCACT CAGTTGTTCC TGGGACCCAG GAACAACTCC TTCAGAACCA 3540

GGAGGTGCAC CCCCAACCTC TTCTCCAGGT CTTCCTAAGG CCCTAGGAAT CTCCGCCACC 3600

TCCCCAGCCA TTACTCCTCC AGGAACCAAG ATGCTCCTTC CGCTCCTGAC CCTCCAGCCT 3660

CTCTTGTTTT ACTTGAACTA TCGTTTCCCA TCACCACCTC TGTGGTGGAT TTTGCGCCTC 3720

ACAGACAGGT ACTCCTGAGA AACAGGCTGG TGGAAGAGTC CAGTATCAGC GGAACTTACA 3780

GGAGGGGAGA CTCGAGATTC CTTCAGGAAA GGTGTAGGAA CCTGGACCAC TTTCTTTTTT 3840






TTTTTTTAAG 


ACAGGGTCCC 


TCTCTGTCGC 


GCAAGCTGGA 


GTGCAGTCAG 


390< 




CGGTGCTATC GCGGCTCATT GTGAGCTCCG GGGATCCTCC CGCCTTAGCA TCCGGTGTAG 3960

CTGAGACCAC AGACATGTGC CACCATGCCA AGCTAATTTT ATTTATTTTT TTTTGGAGAC 4020

GGAGTTTCAC TCTTGTTGCC CAGGCTGGAG TGTAATGGCA TGATCTCAGC TCACCGCAAC 4080

TCCCGCCCCC CGGGTTCAGG CGATTCTCCT GCCTCAGCCT CCCGAGTGGC TGGGATTACA 4140

GGCATGCGCC ACCATGCCCG GCTAATTTTG TATTTTAAGT AGAGACAGGG TTTCTCCACG 4200

TTGGTCAGGC TGGTCTCGAA CTCCCAACCT CAGGTGATCC ACCCACCTTG GCCTCCCAAA 4260

GTGCTGGGAT TACAGGTGTG AGCCACCGCG CCTGGCCCAT GCCAAGCTAA TTTTAAAATT 4320

TTTTTGTAAG AGTGCTCTGT TGCCCAGGCT GATCTTGAAC TCCTGGGCTC AAGGGATCCT 4380

CCCATCTCAG CCTCCCAATA TGCTGGGATT ACAGGTGTGA GCCACAGTGC CCAGCCAAAC 4440

CATGGCTATC TTGAAAACCA CTTGTCTTCC AGTCCCCATG CCCCGAAATT CCAAGGCTCT 4500

CATCCCTGAA ACCTAGGACT CAGGCTCTCC CTACCTCAGC CCCAGGAGTC TAAACCTTTA 4560

ACTTCCTCTT TCCCTGGGAC TAAGGAGTGC TGCACCCCAG GCGCCTCCCT TACCCCACAT 4620

CCCTCCTCAG CCTCCCCTCC TCAGCCTCAG TGCATTTGCT AATTCGCCTT TCCTCCCCTG 4680

CAGCCATGTG GCTCCGGAGC CATCGTCAGC TCTGCCTGGC CTTCCTGCTA GTCTGTGTCC 4740

TCTCTGTAAT CTTCTTCCTC CATATCCATC AAGACAGCTT TCCACATGGC CTAGGCCTGT 4800

CGATCCTGTG TCCAGACCGC CGCCTGGTGA CACCCCCAGT GGCCATCTTC TGCCTGCCGG 4860

GTACTGCGAT GGGCCCCAAC GCCTCCTCTT CCTGTCCCCA GCACCCTGCT TCCCTCTCCG 4920

GCACCTGGAC TGTCTACCCC AATGGCCGGT TTGGTAATCA GATGGGACAG TATGCCACGC 4980

TGCTGGCTCT GGCCCAGCTC AACGGCCGCC GGGCCTTTAT CCTGCCTGCC ATGCATGCCG 5040

CCCTGGCCCC GGTATTCCGC ATCACCCTGC CCGTGCTGGC CCCAGAAGTG GACAGCCGCA 5100

CGCCGTGGCG GGAGCTGCAG CTTCACGACT GGATGTCGGA GGAGTACGCG GACTTGAGAG 5160

ATCCTTTCCT GAAGCTCTCT GGCTTCCCCT GCTCTTGGAC TTTCTTCCAC CATCTCCGGG 5220

AACAGATCCG CAGAGAGTTC ACCCTGCACG ACCACCTTCG GGAAGAGGCG CAGAGTGTGC 5280

TGGGTCAGCT CCGCCTGGGC CGCACAGGGG ACCGCCCGCG CACCTTTGTC GGCGTCCACG 5340

TGCGCCGTGG GGACTATCTG CAGGTTATGC CTCAGCGCTG GAAGGGTGTG GTGGGCGACA 5400

GCGCCTACCT CCGGCAGGCC ATGGACTGGT TCCGGGCACG GCACGAAGCC CCCGTTTTCG 5460

TGGTCACCAG CAACGGCATG GAGTGGTGTA AAGAAAACAT CGACACCTCC CAGGGCGATG 5520

TGACGTTTGC TGGCGATGGA CAGGAGGCTA CACCGTGGAA AGACTTTGCC CTGCTCACAC 5580

AGTGCAACCA CACCATTATG ACCATTGGCA CCTTCGGCTT CTGGGCTGCC TACCTGGCTG 5640

GCGGAGACAC TGTCTACCTG GCCAACTTCA CCCTGCCAGA CTCTGAGTTC CTGAAGATCT 5700

TTAAGCCGGA GGCGGCCTTC CTGCCCGAGT GGGTGGGCAT TAATGCAGAC TTGTCTCCAC 5760

TCTGGACATT GGCTAAGCCT TGAGAGCCAG GGAGACTTTC TGAAGTAGCC TGATCTTTCT 5820

AGAGCCAGCA GTACGTGGCT TCAGAGGCCT GGCATCTTCT GGAGAAGCTT GTGGTGTTCC 5880

TGAAGCAAAT GGGTGCCCGT ATCCAGAGTG ATTCTAGTTG GGAGAGTTGG AGAGAAGGGG 5940

GACGTTTCTG GAACTGTCTG AATATTCTAG AACTAGCAAA ACATCTTTTC CTGATGGCTG 6000

GCAGGCAGTT CTAGAAGCCA CAGTGCCCAC CTGCTCTTCC CAGCCCATAT CTACAGTACT 6060

TCCAGATGGC TGCCCCCAGG AATGGGGAAC TCTCCCTCTG GTCTACTCTA GAAGAGGGGT 6120

TACTTCTCCC CTGGGTCCTC CAAAGACTGA AGGAGCATAT GATTGCTCCA GAGCAAGCAT 6180

TCACCAAGTC CCCTTCTGTG TTTCTGGAGT GATTCTAGAG GGAGACTTGT TCTAGAGAGG 6240

ACCAGGTTTG ATGCCTGTGA AGAACCCTGC AGGGCCCTTA TGGACAGGAT GGGGTTCTGG 6300

AAATCCAGAT AACTAAGGTG AAGAAXCTTT TTAGTTTTTT TTTTTTTTTT TTGGAGACAG 6360

GGTCTCGCTC TGTTGCCCAG GCTGGAGTGC AGTGGCGTGA TCTTGGCTCA CTGCAACTTC 6420

CGCCTCCTGT GTTCAAGCGA TTCTCCTGTC TCAGCCTCCT GAGTAGATGG GACTACAGGC 6480

ACAGGCCATT ATGCCTGGCT AATTTTTGTA TTTTTAGTAG AGACAGGGTT TCACCATGTT 6540

GGCCGGGATG GTCTCGATCT CCTGACCTTG TCATCCACCT GTCTTGGCCT CCCAAAGTGC 6600

TGGGATTACT GGCATGAGCC ACTGTGCCCA GCCCGGATAT TTTTTTTTAA TTATTTATTT 6660

ATTTATTTAT TTATTGAGAC GGAGTCTTGC TCTGTAGCCC AGGCCAGAGT GCAGTGGCGC 6720

GATCTCAGCT CACTGCAAGC TCTGCCTCCC GGGTTCATGC CATTCTGCCT CAGCCTCCTG 6780

AGTAGCTGGG ACTACAGGCG CCCGCCACCA CGCCCGGCTA ATTTTTTTTG TATTTTTAGT 6840

AGAGACGGGG TTTCATCGTG TTAACCAGGA TGGTCTCGAT CTCCTGACCT CGTGATCTGC 6900

CCACCTCGGC CTCCCACAGT GCTGGGATTA CCGGCGTGAG CCACCATGCC TGGCCCGGAT 6960

AATTTTTTTT AATTTTTGTA GAGACGAGGT CTTGTGATAT TGCCCAGGCT GTTCTTCAAC 7020

TCCTGGGCTC AAGCAGTCCT CCCACCTTGG CCTCCCAGAA TGCTGGGTTT ATAGATGTGA 7080

GCCAGCACAC CGGGCCAAGT GAAGAATCTA ATGAATGTGC AACCTAATTG TAGCATCTAA 7140

TGAATGTTCC 


** w X X WW X w 




TV <"P/"* ^ TV TV TV TV O 

wA x GGAAAAG 


ta tv tv r*r* TV TV *M»f* 
AAAvwAl til 


1 ALj 1TGGCCA 


~7 f\ r 


GCGTCTTGCT 




<TT/ ** 1 T^HP/^ ^ TV TV TV 

1 w 1 CTGGAAA 


AGCTGGGGTA 


GTTGG 1 VjACiw 


tv tv r+r* TV ^> 

AGAGCGGGAC 


^ ^ ^ ^ 
726C 


TCTGTCCAAC 


rt-M wJ w w w W>1 W A 


bCLCCTCAAA 




TGTTTGTTTT 


TV ^ ^TV ^ TV TV ^« 

GAG CAGACAG 


732C 


G CT AAAATGT 




TV* TV/"f , /'^TV r P/ , ^TV 


f>rry^ r*»r+ tv tv tv %m 

CTGCCAAAAT 


GGTACAGul T 


t~irT>r* TV ^ ^» TV ^ TA 

tTGGAGCAGA 


-7 -j o 

738C 


ACTTTCCAGG 


G A T r* f* A ft H 


LAL1 1 1 1 1 I 1 


T AAAG CT CAT 


TV X TV OHI^ TV TV 

AAACTGCCAA 


GAG CTCCATA 


"7 A A r 


TATTGGGTGT 


«AVa X X Lnob X 


IGCCTCTCAC 


TV TV TV TV TV TV 

AATGAAGGAA 


GTTGGT CTTT 


GTCTGCAGGT 


750C 


GGGCTGCTGA 


«V3w X w X vjVjua 


^^^^^^^ ^^^^^y^^^^^y s^^^^y 
ILlGirTTCT 


^^tv ta /»m/ »m^^ 

GGAAGTGTGC 


tv ^mTt rri tv tv tv /""» 

AGGTATAAAC 


ACACCCTCTG 


75ot 


TGCTTGTGAP 




blACCGTGCT 


C ATTG CTAAC 


CACTGTCTGT 


C C CTGAACTC 


762C 


CCAGAACCAC TACATCTGGC TTTGGGCAGG TCTGAGATAA AACGATCTAA AGGTAGGCAG 7680


ACCCTGGACC 


C'h.ciccTr*'&r m tv 

LrtbLL, X LAbA 


rcCAGGCAGG 


AGCACGAGGT 


^ *l 1 1 ^ TV TV ^ ^ 

CTGGCCAAGG 


TGGACGGGGT 


774C 


TGTCGAGATC 


x wnuunuwwL 


w 1 x GCTGTTT 


TTTGGAGGGT 


TV TV TV TV TV ^* TV TV 

GAAAGAAGAA 


TV y*^^**f WT1TV TV TV TV 

AC CTTAAACA 


780C 


TAGTCAGCTC 


X wr\ x V>nUA X w 


w wC xGTCTAC 


T CAT C C AG AC 


CCCATGCCTG 


m tv f** urn tv ^n^* 

TAGGCTTATC 


786C 


nuuunu X x aw 


A H 7A TV T^P 
^ivj.X lALAAi X 


GTTACAGTAC 


TGTTCCCAAC 


-TCAGCTGCCA 


CGGGTGAGAG 


792C 


7V/~/' , 7V/'*/'"IVfr*'P 

At* LAb Cj A w GT 


ATGAATTAAA 


AGTCTACAGC 


ACTAACCCGT 


GTCTCTGTAG 


CTTTTTTGGA 


798C 


GCCAGAGCCA CTGTGTATGT GTGTGTGGGT TTGTGTGTGT GTGTGTGTGT GTGTGTGTGT 8040


AAGAGAGTGG 


AGGAAAAGGT 


GGGGTACTTC 


TGAAGACTTT 




TAATTAATTT 


810C 


ATTTTTTTTC AGAGATCGAG TCTTGCTCTG TGGCCCAGGC TGGAGTGCAG TAGTGTGATC 8160

TCGGCCCACT GCAA 8174


(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 365 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Trp Leu Arg Ser His Arg Gln Leu Cys Leu Ala Phe Leu Leu Val 
1 5 10 15 

Cys Val Leu Ser Val Ile Phe Phe Leu His Ile His Gln Asp Ser Phe 
20 25 30 

Pro His Gly Leu Gly Leu Ser Ile Leu Cys Pro Asp Arg Arg Leu Val 
35 40 45 

Thr Pro Pro Val Ala Ile Phe Cys Leu Pro Gly Thr Ala Met Gly Pro 
50 55 60 

Asn Ala Ser Ser Ser Cys Pro Gln His Pro Ala Ser Leu Ser Gly Thr 
65 70 75 80 

Trp Thr Val Tyr Pro Asn Gly Arg Phe Gly Asn Gln Met Gly Gln Tyr 
85 90 95 

Ala Thr Leu Leu Ala Leu Ala Gln Leu Asn Gly Arg Arg Ala Phe Ile 
100 105 110 

Leu Pro Ala Met His Ala Ala Leu Ala Pro Val Phe Arg Ile Thr Leu 
115 120 125 

Pro Val Leu Ala Pro Glu Val Asp Ser Arg Thr Pro Trp Arg Glu Leu 
130 135 140 

Gln Leu His Asp Trp Met Ser Glu Glu Tyr Ala Asp Leu Arg Asp Pro 
145 150 155 160 

Phe Leu Lys Leu Ser Gly Phe Pro Cys Ser Trp Thr Phe Phe His His 
165 170 175 

Leu Arg Glu Gln Ile Arg Arg Glu Phe Thr Leu His Asp His Leu Arg 
180 185 190 

Glu Glu Ala Gln Ser Val Leu Gly Gln Leu Arg Leu Gly Arg Thr Gly 
195 200 205 

Asp Arg Pro Arg Thr Phe Val Gly Val His Val Arg Arg Gly Asp Tyr 
210 215 220 

Leu Gln Val Met Pro Gln Arg Trp Lys Gly Val Val Gly Asp Ser Ala 
225 230 235 240 

Tyr Leu Arg Gln Ala Met Asp Trp Phe Arg Ala Arg His Glu Ala Pro 
245 250 255 

Val Phe Val Val Thr Ser Asn Gly Met Glu Trp Cys Lys Glu Asn Ile 
260 265 270 

Asp Thr Ser Gln Gly Asp Val Thr Phe Ala Gly Asp Gly Gln Glu Ala 
275 280 285 

Thr Pro Trp Lys Asp Phe Ala Leu Leu Thr Gln Cys Asn His Thr Ile 
290 295 300 

Met Thr Ile Gly Thr Phe Gly Phe Trp Ala Ala Tyr Leu Ala Gly Gly 
305 310 315 320 

Asp Thr Val Tyr Leu Ala Asn Phe Thr Leu Pro Asp Ser Glu Phe Leu 
325 330 335 

Lys Ile Phe Lys Pro Glu Ala Ala Phe Leu Pro Glu Trp Val Gly Ile 
340 345 350 

Asn Ala Asp Leu Ser Pro Leu Trp Thr Leu Ala Lys Pro 
355 360 365 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3647 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

CTGCAGAGAG CGCCACCCGG AAGCCACTTT TATAGAAGCT TTTACACACA ATGCTTGATT 60

TTTTTTTTTT TTTTCCGAGA CGGAGTCTCG CTTTGTCGCC CAGGCTGGAG TGCAGTGGCG 120

CGATCTGGGC TCACTGCAAG CTCCGCCTCC TGGGTTGACG CCATTCTCCT GCCTCAGCTT 180

CCCGAGTAGC TGGGACTACA GGCGCCCGCC ACCAAGCCTG GCTAATTTTT TTTTATTTTT 240

AGTGGAGACA GAGTTTCACC GTGTTAGCCA GGATGGTCTC GATCTCCTGA CCTCGGGATC 300

CGCCCGCCTC GGCCTCCCAA AGTGCTGGGA GTATAGGCGT GAGCCACCGC GCCTGGCCTA 360

TACTTGATTT TTAATGAAAA CATTCTTAAA TTCATATGGC TAACGCAAAT TTATTTTCTG 420

TAGGCATAAC ATCAAAAACA CCTGGCAGGA CTGCCCCATT CCCAGCACTG TCTAGTTCTC 480

CCCTAGTATC AGTGGGACTC CACTGATGCA CAGCTGTGAT CTACTAAAAC TTCTCTCAAA 540

ACTTTCTCCT CTCCTTAGGT CAGCAGCCCC GCCCCTGATC TATTTGGAAA TCCCCTGAAT 600

AAAAGTTGAA 


TATCATAAAC 


CAAAGCGAAC 


ACPPAft A A AT 
*Aw w LnVjnnn X 


TPAAATTPAA 
X LilAAl x win 


ww ww X AwwX A 




AAAAATTTCT 


CAAGTGACTG 


TAGACTiTAfiA 


X w X w X wwAw X 


PTPftPPTH AT 1 
w X ww ww X AAX 


TV APPT 1 ?! P Tl TV 

AAww X AwAAG 


7 2 


AGGCCAGTGC 


GATACTGTCT 


X XAw^lWWWX X 


nAL X X wwwXw 


wXAwAAXAX X 


x 'P p*p^pp*pp 
XAX wX X wGX w 


/ O 


ATCATTTTAT 


CATCCAAACT 


A TTTTH PfiTl 
Allll UWil A 


Aw XXX LAI ww 


w X ww^AwAAAA 


X G'X"X"X"l"x"l"AA 


Q A 


GTGCTTGGTA 


AAATTAAT A G 

****** X X XIXX X X* W 


TPi AT* A TTP 7A T 
1 X A x x \~A x 




O 7A f^HPf TV TV TV 

wA w X (jAAUiU 


PPTl Tt ro TV TV m <|H|| 

G CAATAAAlT 


90 


C CTTG ACG AC 


*l^*XJVJV7V~ W X X W 




CAT CTT CAT C 


TTTGGTTTAT 


GAGTCCTGTG 


96 


CGTCTTGGTA 




inLIill wAww 


oppptv taptotv 


GACTTATTTG 


GTAGGGGACC 


1U2 


AAAGGAAAGA 


ACATGTTTTC 

w^k X VJ X X X X w 


A TTP CP X tv a 


AAACATTTTG 


ITCTCTAICC 


1 1 ll III II TV / *tm* 0*0+ 0m 

TTTACTGGGC 


108 


TGGCAGGCAA 


AGGAAATGTT 


f^mrp TV Hp/-* TV f 0% » 


CTuACATTGA 


TV TV TV t *'I"T*7V TV 

AAAlTTAAGT 


fTl /**« T\t f 1 ^ T» "» 

TCTTCACCAA 


1X4 


ATGCAGAGAC 


TCTGAAGGrr 


rlwwwww w x ww 


wwwwX wCCTC 


P* TV TA TV fTWPPC TV 

CACAATT C G A 


CCGTLTCGGC 


120 


GGGCCACGAG 


ATCCTGGrCA 

** X WW X VJ VJ W W £\ 


wwwAX ww ww X 




CTG CT CG CAC 


GTTCCCCCGG 


12o 


CCTCTGGACT 


wwwx WWW X WW 


w X LAA X w w w X 


/"^ /"^ 1 1 0+ 0*r* 0*r* 

CCCTCCGGCG 


GGCGTCGCTG 


GCGGGTGGCT 


132 


AGGCCCAACG 


GCAGGAAGPP 

w W*"\WWXV*W W W 


bALUL X A X ww 


TCCGTTCCGC 


GGCGGlGGGT 


CCGCCTTCCG 


138 


TCTGTTCTAG 


w WW W X WW X WW 


X www www UAb 


w r w CTTTAGA 


TV /"» rm/vnpp TV P 

AGGTtlwGAG 


*4| l^/'«f ft/ Mil TV ^ 

C LTx C GTGTAu 


1 A A 

144 


CTTCCCAGGG 


ATGAACCGGG 


CCTTCCCTCT 


GGAAGGCGAG 


GGTTuGGGtU 


APTiPTPlP-PP 


13U 


AGGGCCAGGG 


CGGTGGGCGC 


GCGCAGAGGG 


AAACCGGATC 


Aw X X wAwAwA 


C TV TV T^r^TV TV TV 

wAA X wAAuau 




TAGCGGATGA 


GG CGCTTGTG 


GGGCGCGGCC 


CGGAAGCCCT 


WwwwwwwwwO 


r*rn0*r*f* ta r* tv tv 
w X w G w A w AA w 


1 ISO 
XDZ 


GAGTGGGCGG 


AGGCGCCGCA 


GGAGGCTCCC 


GGGGCCTGGT 


CGGGCCGGCT 


GGGGGGuGGG 


loo 


CGCAGTGGAA 


GAAAGGGACn 


GGCGGTGCCC GGTTGGGCGT 


CCTGGCCAGC 


T CAC CTTG CC 


174 


CTGGCGGCTC 


GCCCCGCCCG 


GCACTTGGGA 


GGAGCAGGGC 


AGGGCCCGCG 


^^^^^^^^^^^ 

GCLlTx GCAl 


180 


TCTGGGACCG 


CCCCCTTCCA 


TTCCCGGGCC 


AGCGGCGAGC 


ww wVA w i» wAw w 


w w X wwAVvWWw 


lOO 


CAGCTACAGC ATGAGAGCCG GTGCCGCTCC TCCACGCCTG CGGACGCGTG GCGAGCGGAG 1920

GCAGCGCTGC CTGTTCGCGC CATGGGGGCA CCGTGGGGCT CGCCGACGGC GGCGGCGGGC 1980

GGGCGGCGCG GGTGGCGCCG AGGCCGGGGG CTGCCATGGA CCGTCTGTGT GCTGGCGGCC 2040

GCCGGCTTGA CGTGTACGGC GCTGATCACC TACGCTTGCT GGGGGCAGCT GCCGCCGCTG 2100

CCCTGGGCGT CGCCAACCCC GTCGCGACCG GTGGGCGTGC TGCTGTGGTG GGAGCCCTTC 2160

GGGGGGCGCG 


ATAGCGCCCC 


GAGGCCGCCC 


CCTGAGTGGG 
w w x vjav* x w w w 


WWWX WWWWX X 


wAAwA X w\Aww 


o o ^ r 


GGCTGCCGCC 


TGCTCACCGA 


CCGCGCGTCC 


TAGGGAGAGG 
x nuuununuvj 


fT G A GG G G GT 

W X wlww W W W X 


ww X X X X ww\Aw 


*5 *5 q r 


CACCGCGACC 


TCGTGAAGGG 


GCCCCCCGAC 


x wV9V«V«CwLuU 


G CTfZCZTZCZr* A *T 
w w X uuuuV#n X 


w wAww wwvJAt* 




ACTGCCGAGG 


AGGTGG AT CT 


G CG CGTGTTG 




AGGGAGGfff* 
Awwwawwwww 


wwwwww^AwAA. 


"5 A €\ f 


GCCCTGGCGA 


CCTCCAGCCC 


CAGGCCCCCG 




www X X X wwAX 


^ TV TV fWIHll^f^ TV 

wAAw X X ww Aw 




TCGCCCTCGC 


ACTCCCCGGG 


GCTGCGAAGC: 


w x uuLaaVj x A 


A w w X w X X LAA 


w X wwAw w v7iC 


O CEO f 


TCCTACCGGG 


CGGACTCGGA 


CGTCTTTCTC 

«vj x x« x x x \j x ^3 


w w x XAXwwwX 


AwwX wX Aw ww 


TV ^* TV 

wAwAAw w wiu 




CCCGGCGACC 


CGCCCTCAGG 


V» X VJ V, i 7 


V_ wAw X w JL WLn 


WwAAAwAwWW 


ww X ww X ww\JA 


v 


TGGGTGGTGA 


GCCACTGGGA 


CGACCGCCAG 




C PTi Ppl 
wwX Aw X ALLA 


w wAAw X AGC 


Z /LM 


CAACATGTGA 


CCGTGGACGT 


VJ X X WWWWWWW 


WW WW WW WW WW 


wwwAww www X 


r*r*f^ tv tv tv f iwh 


*5 *T ^ j 


GGGCTCCTGC 


ACACAGTGGC 






/""»»Tir?»iHf-if** TV f TV TV 


/^ff^^^^TV ^^1V ^ 


2S2( 


CTGGATTATA 


TCACCGAGAA 




AAww ww X 1 bL 


lvJGCTGGGGC 


GGTGCCGGTG 


288( 


GTGCTGGGCC 


CAGACCGTGC 


C A A GT A G G A tl 


wwwX X XuIvjC 


lllGGGGGGG 


/■>fTHl 1^ TV on^^iv ^ 

GTTCATCCAC 


294< 


GTGGACGACT 


TCCCAAGTGC 


CTCCTCCCTG 




1 wwX XXX ww X 


/^/■^ TV f*r*f*r*TA. TAO 
wwAw ww wAAw 




CCCGCGGTCT 


ATCGCCGCTA 


CTTCCACTGG 


WW W wwwnw w X 


A ww w X w X w wA 


w A X l~Aww X ww 




TTCTGGGACG 


AGCCTTGGTG 


CCGGGTGTGC 


UAUU W X W X AW 


AG AGGfir^TGG 
Aw Aw www X ww 


nG A PPGGPPP 
wwAw wwwwww 


J xz 


AAGAGCATAC 


GGAACTTGGC 


C AG CT G GTT G 


G AGGGGTG A A 


WWWWWWW X WW 


c rTf!/! a a g 

ww X wwAAw ww 


3180 


ACCCAGGGGA 


GCCCAAGTTG 


TCAGCTTTTT 


GATCCTCTAC 


X wX uUil wX w 


w X X wAw X www 




GCATCATGGG 


AGTAAGTTCT 


TCAAACACCC 


ATTTTTGCTC 


f PA r Pf*Gf* A 71 71 71 
X AX wwwAAAA 


TV TV TA f^f^. TV T^^n^PTA 
AAAwwAX X X A 


3300 


CCAATTAATA 


TTACTCAGCA 


CAGAGATGGG 


GGCCCGGTTT 


^^^^ T\ Ti ^p^rt^O^p^p 

w WAX AX X X X X 


X w wAwAww X A 


3360 


wwAAX Xwwww 


TCCCTTTGCT 


GCTGATGGGC 


ATCATTGTTT 


AGGGGTGAAG 


GAGGGGGTTC 


3420 


TTCCTCACCT 


TGTAACCAGT 


GCAGAAATGA 


AATAGCTTAG 


CGGCAAGAAG 


CCGTTGAGGC 


3480 


GGTTTCCTGA 


ATTTCCCCAT 


CTGCCACAGG 


CCATATTTGT 


GGCCCGTGCA 


GCTTCCAAAT 


3540 


CTCATACACA 


ACTGTTCCCG 


ATTCACGTTT 


TTCTGGACCA 


AGGTGAAGCA 


AATTTGTGGT 


3600 


TGTAGAAGGA 


GCCTTGTTGG 


TGGAGAGTGG 


AAGGACTGTG 


1 GCTGCAG 




364 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 405 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Met Gly Ala Pro Trp Gly Ser Pro Thr Ala Ala Ala Gly Gly Arg Arg 
1 5 10 15 

Gly Trp Arg Arg Gly Arg Gly Leu Pro Trp Thr Val Cys Val Leu Ala 
20 25 30 

Ala Ala Gly Leu Thr Cys Thr Ala Leu Ile Thr Tyr Ala Cys Trp Gly 
35 40 45 

Gln Leu Pro Pro Leu Pro Trp Ala Ser Pro Thr Pro Ser Arg Pro Val 
50 55 60 

Gly Val Leu Leu Trp Trp Glu Pro Phe Gly Gly Arg Asp Ser Ala Pro 
65 70 75 80 

Arg Pro Pro Pro Asp Cys Pro Leu Arg Phe Asn Ile Ser Gly Cys Arg 
85 90 95 

Leu Leu Thr Asp Arg Ala Ser Tyr Gly Glu Ala Gln Ala Val Leu Phe 
100 105 110 

His His Arg Asp Leu Val Lys Gly Pro Pro Asp Trp Pro Pro Pro Trp 
115 120 125 

Gly Ile Gln Ala His Thr Ala Glu Glu Val Asp Leu Arg Val Leu Asp 
130 135 140 

Tyr Glu Glu Ala Ala Ala Ala Ala Glu Ala Leu Ala Thr Ser Ser Pro 
145 150 155 160 

Arg Pro Pro Gly Gln Arg Trp Val Trp Met Asn Phe Glu Ser Pro Ser 
165 170 175 

His Ser Pro Gly Leu Arg Ser Leu Ala Ser Asn Leu Phe Asn Trp Thr 
180 185 190 

Leu Ser Tyr Arg Ala Asp Ser Asp Val Phe Val Pro Tyr Gly Tyr Leu 
195 200 205 

Tyr Pro Arg Ser His Pro Gly Asp Pro Pro Ser Gly Leu Ala Pro Pro 
210 215 220 

Leu Ser Arg Lys Gln Gly Leu Val Ala Trp Val Val Ser His Trp Asp 
225 230 235 240 

Glu Arg Gln Ala Arg Val Arg Tyr Tyr His Gln Leu Ser Gln His Val 
245 250 255 

Thr Val Asp Val Phe Gly Arg Gly Gly Pro Gly Gln Pro Val Pro Glu 
260 265 270 

Ile Gly Leu Leu His Thr Val Ala Arg Tyr Lys Phe Tyr Leu Ala Phe 
275 280 285 

Glu Asn Ser Gln His Leu Asp Tyr Ile Thr Glu Lys Leu Trp Arg Asn 
290 295 300 

Ala Leu Leu Ala Gly Ala Val Pro Val Val Leu Gly Pro Asp Arg Ala 
305 310 315 320 

Asn Tyr Glu Arg Phe Val Pro Arg Gly Ala Phe Ile His Val Asp Asp 
325 330 335 

Phe Pro Ser Ala Ser Ser Leu Ala Ser Tyr Leu Leu Phe Leu Asp Arg 
340 345 350 

Asn Pro Ala Val Tyr Arg Arg Tyr Phe His Trp Arg Arg Ser Tyr Ala 
355 360 365 

Val His Ile Thr Ser Phe Trp Asp Glu Pro Trp Cys Arg Val Cys Gln 
370 375 380 

Ala Val Gln Arg Ala Gly Asp Arg Pro Lys Ser Ile Arg Asn Leu Ala 
385 390 395 400 

Ser Trp Phe Glu Arg 
405 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1488 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

ATGGGGGCAC CGTGGGGCTC GCCGACGGCG GCGGCGGGCG GGCGGCGCGG GTGGCGCCGA 60 

GGCCCGGGGC TGCCATGGAC CGTCTGTGTG CTGGCGGCCG CCGGCTTGAC GTGTACGGCG 120 

CTGATCACCT ACGCTTGCTG GGGGCAGCTG CCGCCGCTGC CCTGGGCGTC GCCAACCCCG 180 

TCGCGACCGG TGGGCGTGCT GCTGTGGTGG GAGCCCTTCG GGGGGCGCGA TAGCGCCCCG 240 

AGGCCGCCCC CTGACTGCTG CTGGGGGCAG CTGCCGCCGC TGCCCTGGGC GTCGCCAACC 300 

CCGTCGCGAC CGGTGGGCGT GCTGCTGTGG TGGGAGCCCT TCGGGGGGCG CGATAGCGCC 360 

CCGAGGCCGC CCCCTGACTG CCCGCTGCGC TTCAACATCA GCGGCTGCCG CCTGCTCACC 420 

GACCGCGCGT CCTACGGAGA GGCTCAGGCC GTGCTTTTCC ACCACCGCGA CCTCGTGAAG 480 

GGGCCCCCCG ACTGGCCCCC GCCCTGGGGC ATCCAGGCGC ACACTGCCGA GCCGCTGCGC 540 

TTCAACATCA GCGGCTGCCG CCTGCTCACC GACCGCGCGT CCTACGGAGA GGCTCAGGCC 600 

GTGCTTTTCC ACCACCGCGA CCTCGTGAAG GGGCCCCCCG ACTGGCCCCC GCCCTGGGGC 660 

ATCCAGGCGC ACACTGCCGA GGAGGTGGAT CTGCGCGTGT TGGACTACGA GGAGGCAGCG 720 

GCGGCGGCAG AAGCCCTGGC GACCTCCAGC CCCAGGCCCC CGGGCCAGCG CTGGGTTTGG 780 

ATGAACTTCG AGTCGCCCTC GCACTCCCCG GGGCTGCGAA GCCTGGCAAG TAACCTCTTC 840 

AACTGGACGC TCTCCTACCG GGCGGACTCG GACGTCTTTG TGCCTTATGG CTACCTCTAC 900 

CCCAGAAGCC ACCCCGGCGA CCCGCCCTCA GGCCTGGCCC CGCCACTGTC CAGGAAACAG 960 

GGGCTGGTGG CATGGGTGGT GAGCCACTGG GACGAGCGCC AGGCCCGGGT CCGCTACTAC 1020 

CACCAACTGA GCCAACATGT GACCGTGGAC GTGTTCGGCC GGGGCGGGCC GGGGCAGCCG 1080 

GTGCCCGAAA TTGGGCTCCT GCACACAGTG GCCCGCTACA AGTTCTACCT GGCTTTCGAG 1140 

AACTCGCAGC ACCTGGATTA TATCACCGAG AAGCTCTGGC GCAACGCGTT GCTCGCTGGG 1200 

GCGGTGCCGG TGGTGCTGGG CCCAGACCGT GCCAACTACG AGCGCTTTGT GCCCCGCGGC 1260 

GCCTTCATCC ACGTGGACGA CTTCCCAAGT GCCTCCTCCC TGGCCTCGTA CCTGCTTTTC 1320 

CTCGACCGCA ACCCCGCGGT CTATCGCCGC TACTTCCACT GGCGCCGGAG CTACGCTGTC 1380 

CACATCACCT CCTTCTGGGA CGAGCCTTGG TGCCGGGTGT GCCAGGCTGT ACAGAGGGCT 1440 

GGGGACCGGC CCAAGAGCAT ACGGAACTTG GCCAGCTGGT TCGAGCGG 1488 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1316 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 



TTTATGACAA GCTGTGTCAT AAATTATAAC AGCTTCTCTC AGGACACTGT GGCCAGGAAG 60 

TGGGTGATCT TCCTTAATGA CCCTCACTCC TCTCTCCTCT CTTCCCAGCT ACTCTGACCC 120 

ATGGATCCCC TGGGCCCAGC CAAGCCACAG TGGCTGTGGC GCCGCTGTCT GGCCGGGCTG 180 

CTGTTTCAGC TGCTGGTGGC TGTGTGTTTC TTCTCCTACC TGCGTGTGTC CCGAGACGAT 240 

GCCACTGGAT CCCCTAGGCC AGGGCTTATG GCAGTGGAAC CTGTCACCGG GGCTCCCAAT 300 

GGGTCCCGCT GCCAGGACAG CATGGCGACC CCTGCCCACC CCACCCTACT GATCCTGCTG 360 

TGGACGTGGC CTTTTAACAC ACCCGTGGCT CTGCCCCGCT GCTCAGAGAT GGTGCCCGGC 420 

GCGGCCGACT GCAACATCAC TGCCGACTCC AGTGTGTACC CACAGGCAGA CGCGGTCATC 480 

GTGCACCACT GGGATATCAT GTACAACCCC AGTGCCAACC TCCCGCCCCC CACCAGGCCG 540 

CAGGGGCAGC GCTGGATCTG GTTCAGCATG GAGTCCCCCA GCAACTGCCG GCACCTGGAA 600 

GCCCTGGACG GATACTTCAA TCTCACCATG TCCTACCGCA GCGACTCCGA CATCTTCACG 660 

CCCTACGGCT GGCTGGAGCC GTGGTCCGGC CAGCCTGCCC ACCCACCGCT CAACCTCTCG 720 

GCCAAGACCG AGCTGGTGGC CTGGGCGGTG TCCAACTGGA AGCCGGACTC GGCCAGGGTG 780 

CGCTACTACC AGAGCCTGCA GGCTCATCTC AAGGTGGACG TGTACGGACG CTCCCACAAG 840 

CCCCTGCCCA AGGGGACCAT GATGGAGACG CTGTCCCGGT ACAAGTTCTA TCTGGCCTTC 900 

GAGAACTCCT TGCACCCCGA CTACATCACC GAGAAGCTGT GGAGGAACGC CCTGGAGGCC 960 

TGGGCCGTGC CCGTGGTGCT GGGCCCCAGC AGAAGCAACT ACGAGAGGTT CCTGCCGCCC 1020 

GACGCCTTCA TCCACGTGGA TGACTTCCAG AGCCCCAAGG ACCTGGCCCG GTACCTGCAG 1080 

GAGCTGGACA AGGACCACGC CCGCTACCTG AGCTACTTTC GCTGGCGGGA GACGCTGCGG 1140 

CCTCGCTCCT TCAGCTGGGC ACTGGCTTTC TGCAAGGCCT GCTGGAAGCT GCAGCAGGAA 1200 

TCCAGGTACC AGACGGTGCG CAGCATAGCG GCTTGGTTCA CCTGAGAGGC CGGCATGGGG 1260 

CCTGGGCTGC CAGGGACCTC ACTTTCCCAG GGCCTCACCT ACCTAGGGTC TCTAGA 1316 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 374 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Met Asp Pro Leu Gly Pro Ala Lys Pro Gln Trp Leu Trp Arg Arg Cys 
1 5 10 15 

Leu Ala Gly Leu Leu Phe Gln Leu Leu Val Ala Val Cys Phe Phe Ser 
20 25 30 

Tyr Leu Arg Val Ser Arg Asp Asp Ala Thr Gly Ser Pro Arg Pro Gly 
35 40 45 

Leu Met Ala Val Glu Pro Val Thr Gly Ala Pro Asn Gly Ser Arg Cys 
50 55 60 

Gln Asp Ser Met Ala Thr Pro Ala His Pro Thr Leu Leu Ile Leu Leu 
65 70 75 80 

Trp Thr Trp Pro Phe Asn Thr Pro Val Ala Leu Pro Arg Cys Ser Glu 
85 90 95 

Met Val Pro Gly Ala Ala Asp Cys Asn Ile Thr Ala Asp Ser Ser Val 
100 105 110 

Tyr Pro Gln Ala Asp Ala Val Ile Val His His Trp Asp Ile Met Tyr 
115 120 125 

Asn Pro Ser Ala Asn Leu Pro Pro Pro Thr Arg Pro Gln Gly Gln Arg 
130 135 140 

Trp Ile Trp Phe Ser Met Glu Ser Pro Ser Asn Cys Arg His Leu Glu 
145 150 155 160 

Ala Leu Asp Gly Tyr Phe Asn Leu Thr Met Ser Tyr Arg Ser Asp Ser 
165 170 175 

Asp Ile Phe Thr Pro Tyr Gly Trp Leu Glu Pro Trp Ser Gly Gln Pro 
180 185 190 

Ala His Pro Pro Leu Asn Leu Ser Ala Lys Thr Glu Leu Val Ala Trp 
195 200 205 

Ala Val Ser Asn Trp Lys Pro Asp Ser Ala Arg Val Arg Tyr Tyr Gln 
210 215 220 

Ser Leu Gln Ala His Leu Lys Val Asp Val Tyr Gly Arg Ser His Lys 
225 230 235 240 

Pro Leu Pro Lys Gly Thr Met Met Glu Thr Leu Ser Arg Tyr Lys Phe 
245 250 255 

Tyr Leu Ala Phe Glu Asn Ser Leu His Pro Asp Tyr Ile Thr Glu Lys 
260 265 270 

Leu Trp Arg Asn Ala Leu Glu Ala Trp Ala Val Pro Val Val Leu Gly 
275 280 285 

Pro Ser Arg Ser Asn Tyr Glu Arg Phe Leu Pro Pro Asp Ala Phe Ile 
290 295 300 

His Val Asp Asp Phe Gln Ser Pro Lys Asp Leu Ala Arg Tyr Leu Gln 
305 310 315 320 

Glu Leu Asp Lys Asp His Ala Arg Tyr Leu Ser Tyr Phe Arg Trp Arg 
325 330 335 

Glu Thr Leu Arg Pro Arg Ser Phe Ser Trp Ala Leu Ala Phe Cys Lys 
340 345 350 

Ala Cys Trp Lys Leu Gln Gln Glu Ser Arg Tyr Gln Thr Val Arg Ser 
355 360 365 

Ile Ala Ala Trp Phe Thr 
370 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1086 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 



ATGGATCCCP 


TCzn.n.Tn. p a cz p 

X wVj^lj X W W^~lv3 W 


P A AflPPAP A A 


X uVjULiAlbuL 


wCVwwV^XwXwX 


cc p pn p a crn 

uuv«v«uvwl\«lu 


fir 


CT attt P A *fi P 


X uLlbu X uuC 


IvjIvjIvjI I 1L 


X lLTILLlAuL 


X w IwlUlL 


PPP Aft A Pfi AT 




W W X W \ X 






fp^/^fi^r^/^f^TX, /"^ 

xwCTGGCGAC 


TV^/^TV^TA/^r^TVr* 

AuuAUiLUAv* 


tppp a r*r % r*r m r t 


lOl 


P P p A P P P T P p 


fTl^ TV /MM f^f T* 

lbAlLLlbLl 


Tv i"P/^0 TV ^ TV m/^ ^ 

A x GGACATGG 


CCTTTCCACA 




lLlbl LLLbL 




TCiTT P A A n A 


IbblVjLLLuu 


LACAGCCGAC 


T GCCACATCA 


C T G C CG AC L.G 


tv tv r^'P^np* o 
LAAbb X Lt X AC 




pPAPAftPPAp 


AuALbb x CA 1 


LU1 GGACCAC 


fjy^^^ tv mm m^TV 

TGGGATATCA 


TGTGuAALLL 


fTITV TV •7*^ TV f+f+r* 

x AAu X CACWsC 




PT P P P A P PT**T 


CCCCGAGIjCC 


^OTV/^O^/^r^TX 

G GAGGGGCAG 


CGCTGGATCT 


GGTTGAAL 1 1 


\j Vj A V» C CAC C C 




w w x *vAw luL>C 


AbwAUL X GGA 


AGCC UTGGAC 


iv ^* tv m tv Twn tv 

AG AT AC Tx CA 


TV m / *M 1 ^ TV a> tv 


\a X CC X ACCLvC 




nuLunL X www 


ALA 1L1X CAC 


GCCCTACGGC 


TGGCTGGAGC 


CGTGGTCCGG 


CCACCC X GCC 




V* x\w w LnL w \j C 


1 LAALL I C X U 


GGCCAAGACC 


GAGCTGGTGG 


CCTGGGCGGx 


G X CCAAC X GVi 


QUI 






GCGCTACTAC 


CAGAGCCTGC 


AGGCTCATCT 


O T\ TV /TTP TA 

CAAGGIGGAC 


OOi 


(ZTfZT A PPP A P 
vjIVj! nLu laAC 


t'+*T*r*f^f* TV TV TV 


GCCCCTGCCC 


AAGGGGACCA 


TGATGGAGAC 


GCIG XCCCGii 


/ 


X ACAAlv 1 1L1 


^^^^^^^^^ ^^^^^^^^M 


tv ^ tv iv /'»m^^ 

CGAGAACTCC 


TTGCACCCCG 


ACTACATCAC 


t**f* TA TV TV t**T*^ 

CGAGAAGC1G 


7J)i 
/ oi 


TGGAGGAACG CCCTGGAGGC CTGGGCCGTG CCCGTGGTGC TGGGCCCCAG CAGAAGCAAC 840 

TACGAGAGGT TCCTGCCACC CGACGCCTTC ATCCACGTGG ACGACTTCCA GAGCCCCAAG 900 

GACCTGGCCC GGTACCTGCA GGAGCTGGAC AAGGACCACG CCCGCTACCT GAGCTACTTT 960 

CGCTGGCGGG AGACGCTGCG GCCTCGCTCC TTCAGCTGGG CACTGGATTT CTGCAAGGCC 1020 

TGCTGGAAAC TGCAGCAGGA ATCCAGGTAC CAGACGGTGC GCAGCATAGC GGCTTGGTTC 1080 

ACCTGA 1086 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1654 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

TTTTCTCATC TGTGAAACAG GAATAATAAC AGCTCTTCTC AGGACTCATG GCCTGGAGCT 60 

TTGGTAAGCA GGAGATTGTC ATCAATGACC CTCACTCCTC TCTCCCCACT TCCCAGAGAC 120 

TCTGACCCAT GGATCCCCTG GGCCCGGCCA AGCCACAGTG GTCGTGGCGC TGCTGTCTGA 180 

CCACGCTGCT GTTTCAGCTG CTGATGGCTG TGTGTTTCTT CTCCTATCTG CGTGTGTCTC 240 

AAGACGATCC CACTGTGTAC CCTAATGGGT CCCGCTTCCC AGACAGCACA GGGACCCCCG 300 

CCCACTCCAT CCCCCTGATC CTGCTGTGGA CGTGGCCTTT TAACAAACCC ATAGCTCTGC 360 

CCCGCTGCTC AGAGATGGTG CCTGGCACGG CTGACTGCAA CATCACTGCC GACCGCAAGG 420 

TGTATCCACA GGCAGACGCG GTCATCGTGC ACCACCGAGA GGTCATGTAC AACCCCAGTG 480 

CCCAGCTCCC ACGCTCCCCG AGGCGGCAGG GGCAGCGATG GATCTGGTTC AGCATGGAGT 540 

CCCCAAGCCA CTGCTGGCAG CTGAAAGCCA TGGACGGATA CTTCAATCTC ACCATGTCCT 600 

ACCGCAGCGA CTCCGACATC TTCACGCCCT ACGGCTGGCT GGAGCCGTGG TCCGGCCAGC 660 

CTGCCCACCC ACCGCTCAAC CTCTCGGCCA AGACCGAGCT GGTGGCCTGG GCAGTGTCCA 720 

ACTGGGGGCC AAACTCCGCC AGGGTGCGCT ACTACCAGAG CCTGCAGGCC CATCTCAAGG 780 

TGGACGTGTA CGGACGCTCC CACAAGCCCC TGCCCCAGGG AACCATGATG GAGACGCTGT 840 

CCCGGTACAA GTTCTATCTG GCCTTCGAGA ACTCCTTGCA CCCCGACTAC ATCACCGAGA 900 

AGCTGTGGAG GAACGCCCTG GAGGCCTGGG CCGTGCCCGT GGTGCTGGGC CCCAGCAGAA 960 

GCAACTACGA GAGGTTCCTG CCACCCGACG CCTTCATCCA CGTGGACGAC TTCCAGAGCC 1020 

CCAAGGACCT GGCCCGGTAC CTGCAGGAGC TGGACAAGGA CCACGCCCGC TACCTGAGCT 1080 

ACTTTCGCTG GCGGGAGACG CTGCGGCCTC GCTCCTTCAG CTGGGCACTC GCTTTCTGCA 1140 

AGGCCTGCTG GAAACTGCAG GAGGAATCCA GGTACCAGAC ACGCGGCATA GCGGCTTGGT 1200 

TCACCTGAGA GGCTGGTGTG GGGCCTGGGC TGCCAGGAAC CTCATTTTCC TGGGGCCTCA 1260 

CCTGAGTGGG GGCCTCATCT ACCTAAGGAC TCGTTTGCCT GAAGCTTCAC CTGCCTGAGG 1320 

ACTCACCTGC CTGGGACGGT CACCTGTTGC AGCTTCACCT GCCTGGGGAT TCACCTACCT 1380 

GGGTCCTCAC TTTCCTGGGG CCTCACCTGC TGGAGTCTTC GGTGGCCAGG TATGTCCCTT 1440 

ACCTGGGATT TCACATGCTG GCTTCCAGGA GCGTCCCCTG CGGAAGCCTG GCCTGCTGGG 1500 

GATGTCTCCT GGGGACTTTG CCTACTGGGG ACCTCGGCTG TTGGGGACTT TACCTGCTGG 1560 

GACCTGCTCC CAGAGACCTT CCACACTGAA TCTCACCTGC TAGGAGCCTC ACCTGCTGGG 1620 

GACCTCACCC TGGAGGCACT GGGCCCTGGG AACT 1654 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 359 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Met Asp Pro Leu Gly Pro Ala Lys Pro Gln Trp Ser Trp Arg Cys Cys 
1 5 10 15 

Leu Thr Thr Leu Leu Phe Gln Leu Leu Met Ala Val Cys Phe Phe Ser 
20 25 30 

Tyr Leu Arg Val Ser Gln Asp Asp Pro Thr Val Tyr Pro Asn Gly Ser 
35 40 45 

Arg Phe Pro Asp Ser Thr Gly Thr Pro Ala His Ser Ile Pro Leu Ile 
50 55 60 

Leu Leu Trp Thr Trp Pro Phe Asn Lys Pro Ile Ala Leu Pro Arg Cys 
65 70 75 80 

Ser Glu Met Val Pro Gly Thr Ala Asp Cys Asn Ile Thr Ala Asp Arg 
85 90 95 

Lys Val Tyr Pro Gln Ala Asp Ala Val Ile Val His His Arg Glu Val 
100 105 110 

Met Tyr Asn Pro Ser Ala Gln Leu Pro Arg Ser Pro Arg Arg Gln Gly 
115 120 125 

Gln Arg Trp Ile Trp Phe Ser Met Glu Ser Pro Ser His Cys Trp Gln 
130 135 140 

Leu Lys Ala Met Asp Gly Tyr Phe Asn Leu Thr Met Ser Tyr Arg Ser 
145 150 155 160 

Asp Ser Asp Ile Phe Thr Pro Tyr Gly Trp Leu Glu Pro Trp Ser Gly 
165 170 175 

Gln Pro Ala His Pro Pro Leu Asn Leu Ser Ala Lys Thr Glu Leu Val 
180 185 190 

Ala Trp Ala Val Ser Asn Trp Gly Pro Asn Ser Ala Arg Val Arg Tyr 
195 200 205 

Tyr Gln Ser Leu Gln Ala His Leu Lys Val Asp Val Tyr Gly Arg Ser 
210 215 220 

His Lys Pro Leu Pro Gln Gly Thr Met Met Glu Thr Leu Ser Arg Tyr 
225 230 235 240 

Lys Phe Tyr Leu Ala Phe Glu Asn Ser Leu His Pro Asp Tyr Ile Thr 
245 250 255 

Glu Lys Leu Trp Arg Asn Ala Leu Glu Ala Trp Ala Val Pro Val Val 
260 265 270 

Leu Gly Pro Ser Arg Ser Asn Tyr Glu Arg Phe Leu Pro Pro Asp Ala 
275 280 285 

Phe Ile His Val Asp Asp Phe Gln Ser Pro Lys Asp Leu Ala Arg Tyr 
290 295 300 

Leu Gln Glu Leu Asp Lys Asp His Ala Arg Tyr Leu Ser Tyr Phe Arg 
305 310 315 320 

Trp Arg Glu Thr Leu Arg Pro Arg Ser Phe Ser Trp Ala Leu Ala Phe 
325 330 335 

Cys Lys Ala Cys Trp Lys Leu Gln Glu Glu Ser Arg Tyr Gln Thr Arg 
340 345 350 

Gly Ile Ala Ala Trp Phe Thr 
355 



